This report summarizes the results of the consensus clustering module, which uses resampling-based consensus clustering (Monti et al., 2003) to derive robust proteome clusters. For each candidate number of clusters K, 1000 bootstrap sample data sets are clustered into K clusters using k-means, and a consensus matrix is constructed whose entry (i, j) records the number of times items i and j are assigned to the same cluster divided by the total number of times both items are selected. Values of K between 2 and 10 are evaluated, and the best K is determined by comparing the empirical cumulative distribution functions (CDFs) of the resulting consensus matrices. To compare the clusterings, the increase in CDF area, Kdelta, is evaluated, and the K with the largest Kdelta is defined as the best K. Detailed documentation for the consensus clustering module can be found here. This report shows the metrics used to compare the clusterings and determine the best K, the consensus matrix for the best K, a principal component analysis for the best K clusters, and marker selection and GSEA results for each cluster.
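The procedure described above can be illustrated with a minimal, standard-library Python sketch. It is not the module's implementation. The function names (`kmeans`, `consensus_matrix`, `cdf_area`, `best_k`) are hypothetical, and the k-means routine is a plain Lloyd's algorithm. Three assumptions are made because the report does not spell them out: duplicate draws within a bootstrap sample are collapsed before clustering, the CDF area is taken as the exact area under the empirical CDF on [0, 1], and Kdelta follows the Monti-style convention (the area itself for the smallest K, the relative increase in area otherwise).

```python
import random
from bisect import bisect_right
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_boot=50, seed=0):
    """Entry (i, j): times i and j share a cluster / times both are selected."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    selected = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        # Bootstrap draw with replacement; duplicates collapsed (assumption).
        idx = sorted({rng.randrange(n) for _ in range(n)})
        if len(idx) < k:
            continue
        labels = kmeans([points[i] for i in idx], k, rng)
        for (i, li), (j, lj) in combinations(zip(idx, labels), 2):
            selected[i][j] += 1
            together[i][j] += int(li == lj)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        if selected[i][j]:
            m[i][j] = m[j][i] = together[i][j] / selected[i][j]
    return m


def cdf_area(values):
    """Area under the empirical CDF of `values` over the interval [0, 1]."""
    xs = sorted(values)
    grid = sorted(set(xs) | {0.0, 1.0})
    return sum((b - a) * bisect_right(xs, a) / len(xs)
               for a, b in zip(grid, grid[1:]))


def best_k(points, k_range=range(2, 5), n_boot=50, seed=0):
    """Return (K with the largest Kdelta, dict mapping K to Kdelta)."""
    areas = {}
    for k in k_range:
        m = consensus_matrix(points, k, n_boot, seed)
        pairs = [m[i][j] for i, j in combinations(range(len(m)), 2)]
        areas[k] = cdf_area(pairs)
    ks = sorted(areas)
    delta = {ks[0]: areas[ks[0]]}  # smallest K: the area itself (assumption)
    for prev, k in zip(ks, ks[1:]):
        delta[k] = (areas[k] - areas[prev]) / areas[prev] if areas[prev] else 0.0
    return max(delta, key=delta.get), delta
```

For example, `best_k(points, range(2, 11), n_boot=1000)` would mirror the settings quoted in this report, where `points` is a list of equal-length numeric tuples (one per sample). The small defaults above only keep the sketch fast to run.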
Table: Best K and score for different clustering metrics.
Figure: Consensus matrix for best K = 3.
Figure: Principal component analysis plot for K = 3. Clusters are distinguished by color and shape. Each point is labeled with its sample name.
Figure: Heatmap showing marker selection results for K = 3 clusters. Columns (samples) are sorted by cluster number, and rows (features) are ordered by hierarchical clustering based on Pearson correlation. Sample annotations, including cluster number, are shown at the top of the heatmap.
No significantly enriched pathways in cluster 1 with FDR < 0.01.
No significantly enriched pathways in cluster 2 with FDR < 0.01.
No significantly enriched pathways in cluster 3 with FDR < 0.01.